Full duplicate candidate pruning for frequent connected subgraph mining

نویسندگان

  • Andrés Gago Alonso
  • Jesús Ariel Carrasco-Ochoa
  • José Eladio Medina-Pagola
  • José Francisco Martínez Trinidad
چکیده

Support calculation and duplicate detection are the most challenging and unavoidable subtasks in frequent connected subgraph (FCS) mining. The most successful FCS mining algorithms have focused on optimizing these subtasks since the existing solutions for both subtasks have high computational complexity. In this paper, we propose two novel properties that allow removing all duplicate candidates before support calculation. Besides, we introduce a fast support calculation strategy based on embedding structures. Both properties and the new embedding structure are used for designing two new algorithms: gdFil for mining all FCSs; and gdClosed for mining all closed FCSs. The experimental results show that our proposed algorithms get the best performance in comparison with other well known algorithms. ∗Corresponding author

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Duplicate Candidate Elimination and Fast Support Calculation for Frequent Subgraph Mining

Frequent connected subgraph mining (FCSM) is an interesting task with wide applications in real life. Most of the previous studies are focused on pruning search subspaces or optimizing the subgraph isomorphism (SI) tests. In this paper, a new property to remove all duplicate candidates in FCSM during the enumeration is introduced. Based on this property, a new FCSM algorithm called gdFil is pro...

متن کامل

Mining for Unconnected Frequent Graphs with Direct Subgraph Isomorphism Tests

In the paper we propose the algorithm which discovers both connected and unconnected frequent graphs from the graphs set. Our approach is based on depth first search candidate generation and direct execution of subgraph isomorphism test over database. Several search space pruning techniques are also proposed. Due to lack of unconnected graph mining algorithms we compare our algorithm with two g...

متن کامل

A Closed Frequent Subgraph Mining Algorithm in Unique Edge Label Graphs

Problems such as closed frequent subset mining, itemset mining, and connected tree mining can be solved in a polynomial delay. However, the problem of mining closed frequent connected subgraphs is a problem that requires an exponential time. In this paper, we present ECE-CloseSG, an algorithm for finding closed frequent unique edge label subgraphs. ECE-CloseSG uses a search space pruning and ap...

متن کامل

A new algorithm for mining frequent connected subgraphs based on adjacency matrices

Most of the Frequent Connected Subgraph Mining (FCSM) algorithms have been focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a n...

متن کامل

A Two-Phase Algorithm for Differentially Private Frequent Subgraph Mining

Mining frequent subgraphs from a collection of input graphs is an important task for exploratory data analysis on graph data. However, if the input graphs contain sensitive information, releasing discovered frequent subgraphs may pose considerable threats to individual privacy. In this paper, we study the problem of frequent subgraph mining (FSM) under the rigorous differential privacy model. W...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Integrated Computer-Aided Engineering

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2010